Using Titles vs. Full-text as Source for Automated Semantic Document Annotation

Abstract

A significant part of the largest Knowledge Graph today, the Linked Open Data cloud, consists of metadata about documents such as publications, news reports, and other media articles. While the widespread access to the document metadata is a tremendous advancement, it is yet not so easy to assign semantic annotations and organize the documents along semantic concepts. Providing semantic annotations like concepts in SKOS thesauri is a classical research topic, but typically it is conducted on the full-text of the documents. For the first time, we offer a systematic comparison of classification approaches to investigate how far semantic annotations can be conducted using just the metadata of the documents such as titles published as labels on the Linked Open Data cloud. We compare the classifications obtained from analyzing the documents' titles with semantic annotations obtained from analyzing the full-text. Apart from the prominent text classification baselines kNN and SVM, we also compare recent techniques of Learning to Rank and neural networks and revisit the traditional methods logistic regression, Rocchio, and Naive Bayes. The results show that across three of our four datasets, the performance of the classifications using only titles reaches over 90% of the quality compared to the classification performance when using the full-text. Thus, conducting document classification by just using the titles is a reasonable approach for automated semantic annotation and opens up new possibilities for enriching Knowledge Graphs.
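
The title-versus-full-text comparison described in the abstract can be illustrated with a small sketch. It uses a Rocchio-style nearest-centroid classifier (one of the methods the abstract names) over TF-IDF vectors, trains one model on titles and one on full text, and reports the title score as a fraction of the full-text score. Everything here is an assumption made for illustration: the toy documents, the two labels, the whitespace tokenizer, the single-label accuracy metric, and the helper names (`fit_rocchio`, `predict`, `relative_quality`) are hypothetical and are not taken from the paper, which works with thesaurus concepts and whose evaluation setup is not given in this abstract.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()

def fit_idf(texts):
    # Smoothed inverse document frequency over the training texts.
    n = len(texts)
    df = Counter(tok for text in texts for tok in set(tokenize(text)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def vectorize(text, idf):
    # L2-normalized TF-IDF vector; tokens unseen in training are dropped.
    tf = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}

def fit_rocchio(texts, labels):
    # One centroid per label: the mean of that label's document vectors.
    idf = fit_idf(texts)
    counts = Counter(labels)
    sums = {label: defaultdict(float) for label in counts}
    for text, label in zip(texts, labels):
        for tok, v in vectorize(text, idf).items():
            sums[label][tok] += v
    centroids = {label: {tok: v / counts[label] for tok, v in vec.items()}
                 for label, vec in sums.items()}
    return idf, centroids

def cosine(a, b):
    dot = sum(v * b.get(tok, 0.0) for tok, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(text, model):
    # Assign the label whose centroid is most similar to the document.
    idf, centroids = model
    vec = vectorize(text, idf)
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))

def accuracy(model, texts, labels):
    hits = sum(predict(text, model) == label for text, label in zip(texts, labels))
    return hits / len(labels)

def relative_quality(title_score, full_score):
    # Title performance as a fraction of full-text performance.
    return title_score / full_score if full_score else float("nan")

# Hypothetical toy corpus: (title, full text, label).
train = [
    ("monetary policy and inflation",
     "central banks raise interest rates to control inflation and stabilize prices",
     "economics"),
    ("labor market unemployment trends",
     "unemployment rates and wages respond to labor market policy over time",
     "economics"),
    ("neural networks for image recognition",
     "deep neural networks learn image features from large labeled datasets",
     "computing"),
    ("database query optimization",
     "query planners choose indexes and join orders to speed up database workloads",
     "computing"),
]
test = [
    ("inflation and interest rates",
     "central banks adjust interest rates when inflation rises",
     "economics"),
    ("image datasets for neural networks",
     "large image datasets help neural networks learn features",
     "computing"),
]

train_labels = [label for _, _, label in train]
test_labels = [label for _, _, label in test]

title_model = fit_rocchio([title for title, _, _ in train], train_labels)
full_model = fit_rocchio([full for _, full, _ in train], train_labels)

title_accuracy = accuracy(title_model, [title for title, _, _ in test], test_labels)
full_accuracy = accuracy(full_model, [full for _, full, _ in test], test_labels)

print("title accuracy:", title_accuracy)
print("full-text accuracy:", full_accuracy)
print("title / full-text:", relative_quality(title_accuracy, full_accuracy))
```

On a real collection the same ratio would be computed per dataset and per classifier; a toy corpus of this size says nothing about the 90% figure reported in the abstract.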
